feat(lifecycle): ACT layer — reaction table + escalation engine (split B)#6
Conversation
…t B) Add the ACT half of the LCM: map persisted status transitions to reactions (send-to-agent / notify / auto-merge) and drive escalation. - reactions.go: the §4.2 default reaction table, reactionEventFor (mirrors DeriveLegacyStatus for the ACT layer), in-memory per-(session,reaction) escalation trackers, the react() dispatch chokepoint, and a real TickEscalations for duration-based escalations the synchronous LCM can't wake itself for. auto-merge action exists but is off by default; bugbot-comments/merge-conflicts are configured but dormant (no decide-core producer yet). - manager.go: mutate now returns a transition; each Apply* path fires the mapped reaction after persist via the single synchronous react() seam. OnKillRequested intentionally does not react (explicit kill != inferred event). Split-A load->decide->diff->persist behavior is unchanged. ci-failed budget is persistent across fail->pending->fail oscillation; non-persistent trackers reset when the status leaves the triggering state. Escalation silences further auto-dispatch until the condition clears. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
harshitsinghbhandari
left a comment
There was a problem hiding this comment.
Review — ACT layer (split B), at 8b8da8e
Verified independently: gofmt clean, build/vet pass, go test -race green at 89.2% coverage. The reaction table matches §4.2, the wiring is clean (CI log tail threaded into the SCM reaction; OnKillRequested correctly discards its transition so an explicit kill fires nothing; react() runs outside the per-session lock so a busy-waiting send-to-agent doesn't hold the session mutex), and the synchronous-single-chokepoint design is well-documented. Two findings.
1. Persistent ci-failed tracker can leak / never clear (fix before merge)
react() only clears the tracker for beforeKey, and for a persistent reaction only when incidentOver(afterLC) holds at the moment it leaves that reaction. So the clear never happens if the session leaves ci-failed to another open-PR state first:
ci-failed → approved → merged
└ leaving ci-failed: incidentOver(approved)=false (PR still open) → NOT cleared
leaving approved: only the approved tracker is considered
⇒ the ci-failed tracker is never cleared → leaks
Two consequences:
- Memory leak: one orphaned
reactionTrackerper session that had aci-failedthen recovered before merging. - Stale silencing: if such a tracker already hit
escalated=true, a later regressionapproved → ci-failedre-enterssendToAgent, seesescalated, and is silenced — the agent is never re-nudged and no fresh budget is granted, even though the prior incident effectively ended.
Suggested fix: when a transition reaches incidentOver(afterLC) (PR no longer open / session terminal), clear all trackers for that session, not just beforeKey. That also forces a decision worth making explicit: does a genuine recovery to approved/green count as the incident ending (re-arm the budget), or only a merge/close? The persistent flag was for fail→pending→fail flap — confirm it shouldn't also reset on a real recovery.
2. react() runs outside the per-session lock → unserialized dispatch (integration-time)
Persists are serialized under keyedMutex, but react() fires after the lock releases, so for a live daemon with concurrent observers (SCM poller + reaper + activity ingest) reactions for one session can interleave / run on a stale afterLC snapshot — e.g. a ci-failed send-to-agent dispatching after the session already moved to approved. Not observable in the current single-threaded tests, and the out-of-lock placement is the right call to avoid holding the mutex during a busy-wait. Flag for the integration step: either re-verify the session is still in the triggering state before dispatching, or give react() a per-session ordering (a small react queue). Worth a code comment now so it isn't forgotten.
Minor
- Notify reactions have no cross-re-entry dedup: an
approved → ci-failed → approvedoscillation re-fires "ready to merge" each time it re-enters approved. Probably fine, but noisy under flap. agent-stuck's §4.2threshold: 10misn't implemented (notify fires immediately on entering stuck). Likely redundant given thedetecting→stuckdebounce already gates it — fine to skip, just noting the divergence.
Recommendation: Approve once #1 is addressed (it's a real bug in the escalation engine, the core of this PR); #2 as a tracked integration-time item; minors optional. Nice work on the chokepoint design and the lock-free TickEscalations.
review) Address review finding #1: the persistent ci-failed tracker leaked and could stale-silence a future regression. It was only cleared when leaving the ci-failed reaction AND incidentOver held at that moment — so a recovery to another open-PR state (ci-failed -> approved -> merged) never cleared it. - react() now clears ALL of a session's trackers when the state REACHED is incident-over (PR resolved / session terminal) OR a genuine recovery (approved/mergeable, which the open-PR ladder guarantees means CI is no longer failing). Keyed on the state reached, not the one left, since the recovery transition is typically review_pending->approved (empty beforeKey). - Persistent ci-failed still survives the ambiguous review_pending limbo, so fail->pending->fail keeps one shared budget (§4.2). - Document the out-of-lock react() dispatch caveat for the daemon integration step (review #2) and the intentionally-skipped agent-stuck 10m threshold. Tests: re-arm after a genuine recovery (regression re-nudges, not silenced); all session trackers cleared once the incident is over. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks for the careful review — pushed #1 (the real bug) — fixed. You're right, and the trace was spot-on. My original clear was keyed on
#2 (out-of-lock dispatch) — documented as a tracked integration item. Added a comment on Minors:
Gates green after the fix: gofmt clean, build, vet, |
harshitsinghbhandari
left a comment
There was a problem hiding this comment.
Re-review — b2161d5
Verified independently: gofmt clean, build/vet pass, go test -race green at 89.3% coverage. Finding #1 is correctly resolved, finding #2 documented.
#1 — persistent-tracker lifecycle ✅ The clear is now keyed on the state reached (incidentOver || recovered) and clears all of the session's trackers there, with recovered scoped to unambiguous green (approved/merge_ready) — which, since the open-PR ladder ranks ci_failing above approval, can't coexist with failing CI. This is the right call and resolves the open question well:
ci-failed → approved → mergedno longer leaks the ci-failed tracker (cleared on reaching approved).- a regression
approved → ci-failedre-arms with a fresh budget instead of staying silenced. - the
fail → pending → failflap still drains one budget —review_pending/pr_openare neither incident-over nor recovered, so the persistent tracker survives the limbo. - covered by
TestReaction_CIFailedRearmsOnGenuineRecoveryandTestReaction_IncidentOverClearsAllSessionTrackers.
#2 — out-of-lock dispatch ✅ Documented as an integration-time caveat on react() with the concrete options (per-session react queue / re-check before dispatch). Good enough until the daemon runs concurrent observers.
agent-stuck threshold — documented why it's intentionally skipped (upstream detecting→stuck debounce). Agreed.
LGTM — approving. Clean turnaround.
There was a problem hiding this comment.
Pull request overview
Implements the Lifecycle Manager ACT layer, adding reaction dispatch and escalation behavior after persisted lifecycle transitions.
Changes:
- Adds the default reaction table, reaction mapping, tracker-based escalation, and
TickEscalations. - Wires
Apply*paths to return persisted transitions and invokereact()after mutation. - Adds reaction-focused tests and replaces the prior no-op escalation test.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
backend/internal/lifecycle/reactions.go |
Adds reaction definitions, dispatch, escalation tracking, and duration tick handling. |
backend/internal/lifecycle/reactions_test.go |
Adds tests for reaction mapping, escalation, tracker clearing, and tick behavior. |
backend/internal/lifecycle/manager.go |
Wires lifecycle mutations to post-persist reactions and initializes tracker state. |
backend/internal/lifecycle/manager_test.go |
Removes the obsolete no-op TickEscalations test. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…escalation (Copilot review) - OnKillRequested now clears the session's escalation trackers after a successful kill, so a later duration-based TickEscalations can't emit reaction.escalated for a dead session (dispatch is still skipped). - sendToAgent rolls back the attempt (and firstAttemptAt when it set it) on a messenger.Send error, so undelivered messages don't march a reaction toward escalation — honoring "send failures retry next tick" (§4.3). - Duration escalation now uses an inclusive boundary (>=) in both shouldEscalate and TickEscalations, so a 30m reaction escalates at exactly 30m instead of waiting for the next tick. Tests: kill clears trackers + no post-kill escalation; repeated failed delivery never escalates; duration escalation fires at exactly escalateAfter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
harshitsinghbhandari
left a comment
There was a problem hiding this comment.
Re-review — 5081cf7 (Copilot fixes)
Verified the delta over the LGTM'd b2161d5; gofmt clean, build/vet pass, go test -race green at 89.3%. All five Copilot fixes are sound:
- OnKillRequested clears the session's trackers — correct; a kill is terminal but fires no reaction, so without this a leftover tracker could later emit
reaction.escalatedfor a dead session viaTickEscalations. Covered byTestReaction_KillClearsEscalationTrackers. - Send-error attempt rollback — decrements
attempts(and resetsfirstAttemptAtonly when this call set it, viafreshFirst) undertrackerMu, so an undelivered message doesn't march toward escalation (§4.3). Covered byTestReaction_SendFailureDoesNotBurnBudget. - Inclusive
>=boundary — applied consistently in bothshouldEscalateandTickEscalations, so escalation fires at exactlyescalateAfterinstead of waiting a tick. - Out-of-lock dispatch note — = my finding #2, documented and deferred to daemon integration.
Still LGTM. Good to merge.
…viewer to worker harness Per PR #197 review feedback: - Reviewer agent posts its review to the PR itself, so remove the ports.PRReviewPoster port, the GitHub review poster, the submit HTTP route + DTO, and the service Submit method (#1, #4, #7). - Trigger spawns the reviewer agent over the worker's worktree with its own review prompt, mirroring the session launch flow (resolve agent by harness -> argv -> runtime.Create) (#8, #9). - Default reviewer harness reuses the worker's harness when supported, falling back to claude-code; reviewer config stays independent of the worker override (#5, #6). - Drop the `ao review` CLI for this PR's scope (#2, #3). Regenerated OpenAPI + TS types. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(review): configurable AO code review backend (V1)
Add per-project configurable code review of a worker's PR. A reviewer
agent runs one-shot over the worker's own worktree and posts its result
to the PR; the worker picks the feedback up through the existing SCM
observer review-nudge path.
- domain: ProjectConfig.reviewers (+ default reviewer harness), Review /
ReviewRun types and verdict/status vocab.
- storage: review + review_run tables (0011), sqlc queries, store methods.
- service/review: rewrite the in-memory stub as a persisted ReviewService
(Trigger/Submit/List) with a reviewer Runner over agent resolver +
runtime; ports.PRReviewPoster implemented on the GitHub adapter.
- http: session-scoped routes POST /sessions/{id}/reviews/trigger,
POST .../submit, GET .../reviews; regenerated OpenAPI + TS types.
- cli: ao review trigger|submit|list.
- frontend: adapt ReviewDashboard to the per-worker reviews API.
Closes #192
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(review): address review — drop submit/poster/CLI, default reviewer to worker harness
Per PR #197 review feedback:
- Reviewer agent posts its review to the PR itself, so remove the
ports.PRReviewPoster port, the GitHub review poster, the submit HTTP
route + DTO, and the service Submit method (#1, #4, #7).
- Trigger spawns the reviewer agent over the worker's worktree with its
own review prompt, mirroring the session launch flow (resolve agent by
harness -> argv -> runtime.Create) (#8, #9).
- Default reviewer harness reuses the worker's harness when supported,
falling back to claude-code; reviewer config stays independent of the
worker override (#5, #6).
- Drop the `ao review` CLI for this PR's scope (#2, #3).
Regenerated OpenAPI + TS types.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(review): restore ao review submit (records verdict+body in AO)
Per maintainer request, bring back `ao review submit`. AO records the
reviewer's verdict and body on the review_run and marks the pass complete;
it does not post to GitHub — the reviewer agent posts its review to the PR
itself.
- storage: add review_run.body (0011), persist via Insert/UpdateReviewRunResult.
- service: restore Submit (no SCM poster) storing verdict + body.
- http: restore POST /sessions/{id}/reviews/submit + SubmitReviewInput.
- cli: ao review submit [worker] --verdict --body (worker from arg/--session/$AO_REVIEW_WORKER).
- runner: reviewer prompt instructs posting to GitHub and recording via ao review submit.
Regenerated OpenAPI + TS types.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(review): move reviewer runner to its own package; sharpen prompt
Per PR #197 review:
- Move the concrete reviewer runner out of the service layer into a new
internal/review_runner package (package reviewrunner), beside other
orchestration packages like session_manager. The service keeps only the
Runner interface + RunSpec it depends on; the agent-resolver + runtime
launch flow lives in review_runner.
- Sharpen the reviewer prompt: tell the agent to diff against the PR base,
focus on high-confidence findings, post via `gh pr review`, and record
the result with `ao review submit`; review-only (no commits/edits).
- Add unit tests for the runner.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(review): simplify review_run schema; provider-agnostic reviewer prompt
Per PR #197 review:
- review_run: status default 'running' (drop 'pending'), drop CHECK
constraints on status/verdict, drop the updated_at column and the
session/iteration index. Propagated through queries, domain, store,
service, and tests.
- Reviewer prompt no longer hardcodes GitHub/gh commands — it instructs the
agent to use whatever review tooling the provider offers, keeping the
flow extensible across SCM providers.
Regenerated sqlc + OpenAPI/TS.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(review): launch reviewer before persisting the run
Trigger now spawns the reviewer agent first and then writes the review_run
with a status derived from the launch outcome (running on success, failed
if it never started), instead of inserting a running row and correcting it
to failed afterwards.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(review): pluggable reviewer registry distinct from worker harnesses
Reviewers are now their own pluggable adapter set, separate from the worker
agent registry — adding a reviewer (claude-code today, greptile tomorrow) is
a one-line registration that does not widen the worker harness vocabulary,
and a worker harness does not automatically become a valid reviewer.
- domain.ReviewerHarness: a distinct vocabulary (AllReviewerHarnesses) with
its own IsKnown; ReviewerConfig/Review/ReviewRun use it. ResolveReviewerHarness
reuses the worker harness only when it is itself a supported reviewer, else
falls back to claude-code.
- ports.Reviewer: a reviewer-specific contract (ReviewCommand → argv + env)
that models one-shot / non-prompt CLIs natively instead of forcing every
reviewer through the worker's interactive GetLaunchCommand(Prompt:...).
- internal/adapters/reviewer: a separate registry + resolver (mirrors the
worker agent registry) with the claude-code reviewer adapter, which owns the
review prompt and reuses the worker claude-code launch construction.
- review_runner resolves via the reviewer registry (not the worker
AgentResolver) and merges AO_REVIEW_WORKER into the adapter's env.
- daemon wires the reviewer resolver. Registry/domain parity is test-enforced.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(review): cover run-scoped reviewer submit
* fix(api): update generated review submit schema
* refactor(review): split core engine (internal/review) from API service
Move the review orchestration (Trigger/Submit/List, run-id generation,
deps, RunSpec/Runner, sentinels) into a transport-independent core package
internal/review (Engine). internal/service/review is now a thin API-flow
boundary: the controller-facing Manager interface + a Service that delegates
to the engine + error re-exports.
This keeps the service layer to API concerns and lets the same engine back a
future in-process CLI trigger without going through HTTP. review_runner now
depends on the core package; daemon builds the engine and wraps it in the
service. No API/schema changes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(review): commit-aware trigger, reviewer handle for UI, no env vars
Reworks the review trigger lifecycle and drops env-based coupling:
- review_run gains target_sha (the reviewed commit) and drops iteration.
A repeat trigger for the same PR head short-circuits to the existing run.
- review gains reviewer_handle_id: the live reviewer pane's runtime handle,
reused across passes and exposed in the reviews API so the UI can attach
its terminal over /mux.
- Trigger flow: if a live reviewer pane exists and a new commit arrived,
message it to re-review; otherwise spawn a fresh reviewer. The run is
recorded only after the reviewer is launched.
- No environment variables: the reviewer adapter embeds the explicit
`ao review submit --session <w> --run <id>` command in the spawn prompt
and the re-review message. CLI submit requires --run/--session (no env
fallbacks).
- Merge review_runner into internal/review as a Launcher (spawn/notify/alive).
- Trigger returns 201 for a new pass, 200 when reusing an existing run.
Regenerated sqlc + OpenAPI/TS.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(review): author the reviewer prompt centrally, not in the adapter
Mirror the worker model (session_manager builds the prompt; adapters just
place it via LaunchConfig.Prompt). The reviewer prompt now lives in
internal/review/prompt.go and is passed through ports.ReviewInvocation.Prompt;
the claude-code reviewer adapter just feeds inv.Prompt to its launch command
and returns it as the re-review message. One-shot CLI reviewers may ignore it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(review): split reviewer prompt into system+task, mirroring buildSpawnTexts
Mirror session_manager.buildSpawnTexts for the reviewer: a standing role goes
in the system prompt, the per-pass task (PR/commit + exact `ao review submit`
command) goes in the user prompt. internal/review/prompt.go now returns
(prompt, systemPrompt); both flow through ports.ReviewInvocation and the
claude-code adapter places them via LaunchConfig{Prompt, SystemPrompt}. The
re-review message reuses the per-pass prompt (role already established in the
running pane).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Vaibhaav <user@example.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Summary
Implements split B of the Lifecycle Manager: the ACT layer that turns persisted status transitions into reactions and drives escalation. Builds on split A's synchronous
Apply*pipeline (load→decide→diff→persist).reactions.go) — the full distillation §4.2 default:ci-failed(send, retries 2, persistent),changes-requested/bugbot-comments(send, 30m),merge-conflicts(send, 15m),agent-idle(send, retries 2 + 15m),approved-and-green(notify — human decides to merge),agent-stuck/agent-needs-input/agent-exited(notify urgent),pr-closed(notify action),all-complete(notify info).auto-mergeaction kind exists but off by default (no row uses it).bugbot-comments/merge-conflictsare configured but dormant — split-A's decide core has no producer for them yet.reactionEventFor— maps a canonical record → reaction key, mirroringDeriveLegacyStatus. Closed-PR is detected off the PR axis (it derives toidle).reactionTracker{attempts, escalated, firstAttemptAt}keyed by(sessionID, reactionKey). Numeric cap or duration → emitreaction.escalated+ notify + silence further auto-dispatch. Send failures don't burn budget. Trackers clear on leaving the state, exceptci-failed(persistent acrossfail→pending→fail; resets only when the PR leaves open / session terminal).TickEscalations(now)— fires duration escalations the synchronous LCM can't wake itself for; never writes canonical state.manager.go, additive) —mutatereturns a*transition; eachApply*path fires the mapped reaction after persist via the single synchronousreact()chokepoint (the seam that keeps the async move localized later).OnKillRequestedintentionally does not react (explicit kill ≠ inferred event). Split-A behavior unchanged.Design decisions (agreed with @HARSHIT)
react()chokepoint — the reaper/daemon that would own a worker goroutine is out of this split; making dispatch async later is a one-function change.Test plan
gofmt -l .cleango build ./...go vet ./...go test -race ./...TickEscalations, non-persistent tracker clearing, pr-closed/merged, agent-exited on inferred death, explicit kill fires nothing.🤖 Generated with Claude Code